
    Subset Sampling and Its Extensions

    This paper studies the \emph{subset sampling} problem. The input is a set $\mathcal{S}$ of $n$ records together with a function $\textbf{p}$ that assigns each record $v\in\mathcal{S}$ a probability $\textbf{p}(v)$. A query returns a random subset $X$ of $\mathcal{S}$, where each record $v\in\mathcal{S}$ is sampled into $X$ independently with probability $\textbf{p}(v)$. The goal is to store $\mathcal{S}$ in a data structure to answer queries efficiently. If $\mathcal{S}$ fits in memory, the problem is interesting when $\mathcal{S}$ is dynamic. We develop a dynamic data structure with $\mathcal{O}(1+\mu_{\mathcal{S}})$ expected \emph{query} time, $\mathcal{O}(n)$ space and $\mathcal{O}(1)$ amortized expected \emph{update}, \emph{insert} and \emph{delete} time, where $\mu_{\mathcal{S}}=\sum_{v\in\mathcal{S}}\textbf{p}(v)$. The query time and space are optimal. If $\mathcal{S}$ does not fit in memory, the problem is difficult even if $\mathcal{S}$ is static. Under this scenario, we present an I/O-efficient algorithm that answers a \emph{query} in $\mathcal{O}\left((\log^*_B n)/B+(\mu_\mathcal{S}/B)\log_{M/B}(n/B)\right)$ amortized expected I/Os using $\mathcal{O}(n/B)$ space, where $M$ is the memory size, $B$ is the block size, and $\log^*_B n$ is the number of iterative $\log_2(\cdot)$ operations we need to perform on $n$ before going below $B$. In addition, when each record is associated with a real-valued key, we extend the \emph{subset sampling} problem to the \emph{range subset sampling} problem, in which we require that the keys of the sampled records fall within a specified input range $[a,b]$. For this extension, we provide a solution under the dynamic setting, with $\mathcal{O}(\log n+\mu_{\mathcal{S}\cap[a,b]})$ expected \emph{query} time, $\mathcal{O}(n)$ space and $\mathcal{O}(\log n)$ amortized expected \emph{update}, \emph{insert} and \emph{delete} time.

    Comment: 17 pages
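
    The in-memory bound can be illustrated with the classical bucketing-plus-geometric-skipping technique. The sketch below is a static, simplified illustration of that standard technique, not the paper's dynamic or I/O-efficient structure, and all class and variable names are made up; it answers a query in expected $\mathcal{O}(\log n+\mu_{\mathcal{S}})$ time rather than the optimal $\mathcal{O}(1+\mu_{\mathcal{S}})$.

        import math
        import random

        class SubsetSampler:
            """Static sketch of subset sampling (illustrative, not the paper's structure).

            Records whose probability lies in (2**-(j+1), 2**-j] go to bucket j. A query
            scans each bucket with geometric skips at the bucket's upper bound q = 2**-j
            and thins every hit with probability p/q, so each record is reported
            independently with probability p. Expected query time is O(#buckets + mu),
            i.e. O(log n + mu).
            """

            def __init__(self, records):
                # records: iterable of (value, probability) pairs with 0 < p <= 1
                self.buckets = {}
                for v, p in records:
                    j = max(0, int(math.floor(-math.log2(p))))
                    self.buckets.setdefault(j, []).append((v, p))

            def query(self):
                result = []
                for j, bucket in self.buckets.items():
                    q = 2.0 ** (-j)
                    i = 0
                    while i < len(bucket):
                        if q < 1.0:
                            # jump over the indices a Bernoulli(q) scan would reject
                            u = random.random() or 1e-300
                            i += int(math.log(u) / math.log(1.0 - q))
                            if i >= len(bucket):
                                break
                        v, p = bucket[i]
                        # the hit happened at rate q; accept with p/q to get rate p
                        if random.random() < p / q:
                            result.append(v)
                        i += 1
                return result

        # Usage: every call returns an independent random subset.
        sampler = SubsetSampler([("a", 0.9), ("b", 0.5), ("c", 0.01)])
        print(sampler.query())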

    Nuclear Matter and Neutron Stars from Relativistic Brueckner-Hartree-Fock Theory

    The momentum and isospin dependence of the single-particle potential for the in-medium nucleon are key quantities in the Relativistic Brueckner-Hartree-Fock (RBHF) theory. They depend on how the scalar and vector components of the single-particle potential inside nuclear matter are extracted. In contrast to RBHF calculations in the Dirac space with the positive-energy states (PESs) only, the single-particle potential can be determined uniquely by the RBHF theory together with the negative-energy states (NESs), i.e., the RBHF theory in the full Dirac space. The saturation properties of symmetric and asymmetric nuclear matter in the full Dirac space are systematically investigated based on the realistic Bonn nucleon-nucleon potentials. To further demonstrate the importance of the calculations in the full Dirac space, neutron star properties are investigated. The direct URCA process in neutron star cooling sets in at densities $\rho_{\rm DURCA}=0.43,\ 0.48,\ 0.52\ \mathrm{fm}^{-3}$ with the proton fraction $Y_{p,\rm DURCA}=0.13$. The radii of a $1.4M_\odot$ neutron star are predicted as $R_{1.4M_\odot}=11.97,\ 12.13,\ 12.27$ km, and the corresponding tidal deformabilities are $\Lambda_{1.4M_\odot}=376,\ 405,\ 433$ for the Bonn A, B, and C potentials, respectively. Compared with the results obtained in the Dirac space with PESs only, the full-Dirac-space RBHF calculation predicts the softest symmetry energy, which is more favored by the gravitational-wave (GW) detection from GW170817. Furthermore, the results from the full-Dirac-space RBHF theory are consistent with recent astronomical observations of massive neutron stars and simultaneous mass-radius measurements.
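
    For context, the dimensionless tidal deformability $\Lambda$ quoted above is conventionally defined through the quadrupole tidal Love number $k_2$ and the stellar mass $M$ and radius $R$ (a standard relation in gravitational-wave analyses, not a result of this work):

    $$\Lambda = \frac{2}{3}\,k_2\left(\frac{c^{2}R}{GM}\right)^{5},$$

    so at fixed mass a softer equation of state, which yields a smaller radius, directly lowers $\Lambda$; this is why the GW170817 constraint favors the softer full-Dirac-space result.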

    Offline Experience Replay for Continual Offline Reinforcement Learning

    The capability of continuously learning new skills from a sequence of pre-collected offline datasets is desirable for an agent. However, consecutively learning a sequence of offline tasks is likely to lead to catastrophic forgetting under resource-limited scenarios. In this paper, we formulate a new setting, continual offline reinforcement learning (CORL), in which an agent learns a sequence of offline reinforcement learning tasks and pursues good performance on all learned tasks with a small replay buffer, without exploring any of the environments of the sequential tasks. To learn consistently on all sequential tasks, the agent must acquire new knowledge while preserving old knowledge in an offline manner. To this end, we introduce continual learning algorithms and experimentally find experience replay (ER) to be the most suitable algorithm for the CORL problem. However, we observe that introducing ER into CORL raises a new distribution shift problem: the mismatch between the experiences in the replay buffer and the trajectories from the learned policy. To address this issue, we propose a new model-based experience selection (MBES) scheme to build the replay buffer, in which a transition model is learned to approximate the state distribution. This model is used to bridge the distribution bias between the replay buffer and the learned model by selecting for storage the offline data that most closely resembles the learned model. Moreover, to enhance the ability to learn new tasks, we retrofit the experience replay method with a new dual behavior cloning (DBC) architecture that avoids the disturbance of the behavior-cloning loss on the Q-learning process. Overall, we call our algorithm offline experience replay (OER). Extensive experiments demonstrate that our OER method outperforms SOTA baselines in widely used MuJoCo environments.

    Comment: 9 pages, 4 figures
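
    As a rough illustration of the selection idea (not the paper's MBES algorithm; the dynamics model, policy, and all names below are assumed placeholders), one can score candidate offline transitions by how closely they match what a learned transition model predicts under the current policy and keep only the best matches for the small replay buffer:

        import numpy as np

        def select_replay_data(transitions, dynamics_model, policy, buffer_size):
            """Hypothetical sketch of model-based selection for a replay buffer.

            transitions: list of (state, action, next_state) tuples from offline data.
            dynamics_model(state, action) -> predicted next state (assumed callable).
            policy(state) -> action (assumed callable).
            Keeps the buffer_size transitions whose observed next state is closest
            to what the model predicts when the current policy acts in that state.
            """
            scores = []
            for s, a, s_next in transitions:
                a_pi = policy(s)                     # action the learned policy would take
                s_pred = dynamics_model(s, a_pi)     # next state the model expects
                scores.append(-np.linalg.norm(s_next - s_pred))  # closer match, higher score
            top = np.argsort(scores)[-buffer_size:]
            return [transitions[i] for i in top]

        # Toy usage with stand-in model and policy (purely illustrative).
        data = [(np.zeros(2), 0, np.ones(2)), (np.zeros(2), 1, np.zeros(2))]
        buffer = select_replay_data(data, dynamics_model=lambda s, a: s,
                                    policy=lambda s: 0, buffer_size=1)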

    Properties of $^{208}$Pb predicted from the relativistic equation of state in the full Dirac space

    Relativistic Brueckner-Hartree-Fock (RBHF) theory in the full Dirac space allows one to determine uniquely the momentum dependence of the scalar and vector components of the single-particle potentials. In order to extend this new method from nuclear matter to finite nuclei, as a first step, properties of $^{208}$Pb are explored by using the microscopic equation of state for asymmetric nuclear matter and a liquid droplet model. The neutron and proton density distributions, the binding energies, the neutron and proton radii, and the neutron skin thickness of $^{208}$Pb are calculated. To further compare the charge densities predicted by the RBHF theory in the full Dirac space with the experimental charge densities, the differential cross sections and the electric charge form factors for elastic electron-nucleus scattering are obtained by using the phase-shift analysis method. The results from the RBHF theory are in good agreement with the experimental data. In addition, the uncertainty arising from variations of the surface-term parameter $f_0$ in the liquid droplet model is also discussed.
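
    As background for the form-factor comparison, in the simplest plane-wave Born approximation the electric charge form factor of a spherical nucleus is just the Fourier transform of its charge density (a standard textbook relation; the phase-shift analysis used here additionally accounts for the Coulomb distortion of the electron waves):

    $$F_{\rm ch}(q)=\frac{4\pi}{Zq}\int_0^{\infty}\rho_{\rm ch}(r)\,\sin(qr)\,r\,\mathrm{d}r,$$

    normalized so that $F_{\rm ch}(0)=1$.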

    PRSim: Sublinear Time SimRank Computation on Large Power-Law Graphs

    \emph{SimRank} is a classic measure of the similarity of nodes in a graph. Given a node $u$ in a graph $G=(V,E)$, a \emph{single-source SimRank query} returns the SimRank similarities $s(u,v)$ between node $u$ and each node $v\in V$. This type of query has numerous applications in web search and social network analysis, such as link prediction, web mining, and spam detection. Existing methods for single-source SimRank queries, however, incur query cost at least linear in the number of nodes $n$, which renders them inapplicable for real-time and interactive analysis. This paper proposes PRSim, an algorithm that exploits the structure of graphs to efficiently answer single-source SimRank queries. PRSim uses an index of size $O(m)$, where $m$ is the number of edges in the graph, and guarantees a query time that depends on the \emph{reverse PageRank} distribution of the input graph. In particular, we prove that PRSim runs in sub-linear time if the degree distribution of the input graph follows a power-law distribution, a property possessed by many real-world graphs. Based on the theoretical analysis, we show that the empirical query time of all existing SimRank algorithms also depends on the reverse PageRank distribution of the graph. Finally, we present the first experimental study that evaluates the absolute errors of various SimRank algorithms on large graphs, and we show that PRSim outperforms the state of the art in terms of query time, accuracy, index size, and scalability.

    Comment: ACM SIGMOD 201
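
    For intuition about what a SimRank score measures, the sketch below estimates a single pair score with the classic Monte Carlo reverse-random-walk interpretation of SimRank (Jeh and Widom); it is the textbook estimator that indexing methods such as PRSim refine, not PRSim itself, and the function and graph names are illustrative.

        import random

        def simrank_mc(in_neighbors, u, v, c=0.6, num_walks=10000, max_steps=20):
            """Monte Carlo estimate of SimRank s(u, v).

            s(u, v) equals the expected value of c**t, where t is the first step at
            which two reverse random walks started from u and v meet while moving in
            lockstep along in-edges. in_neighbors maps each node to a list of its
            in-neighbors; c is the SimRank decay factor.
            """
            if u == v:
                return 1.0
            total = 0.0
            for _ in range(num_walks):
                x, y = u, v
                for t in range(1, max_steps + 1):
                    if not in_neighbors.get(x) or not in_neighbors.get(y):
                        break                  # a walk with no in-edges can never meet
                    x = random.choice(in_neighbors[x])
                    y = random.choice(in_neighbors[y])
                    if x == y:
                        total += c ** t        # first meeting at step t contributes c^t
                        break
            return total / num_walks

        # Tiny example: "a" and "b" share the single in-neighbor "c", so s(a, b) = c = 0.6.
        graph = {"a": ["c"], "b": ["c"], "c": []}
        print(simrank_mc(graph, "a", "b"))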